perm filename COMP.WRU[206,JMC] blob
sn#005337 filedate 1971-01-05 generic text, type T, neo UTF8
00100 COMPUTER SCIENCE DEPARTMENT
00200 STANFORD UNIVERSITY
00300
00400
00500 CS 206 COMPUTING WITH SYMBOLIC EXPRESSIONS FALL 1970
00600
00700
00800 ADDITIONAL INFORMATION ABOUT THE THE COMPILER PROBLEM
00900
01000 1. Some facts about the PDP-10.
01100
01200 The following facts about the PDP-10 may be of use in writing
01300 the limited LISP compiler called for in the exercise:
01400
01500 The PDP-10 has a 36 bit word and an 18 bit address. In
01600 instructions and in accumulators used as index registers this is the
01700 right part of the word where the least significant bits in arithmetic
01800 reside.
01900
02000 There are 16 general registers which serve simultaneously as
02100 accumulators (receiving the results of arithmetic operations), index
02200 registers (modifying the nominal addresses of instructions to form
02300 effective addresses), and as the first 16 registers of memory (if the
02400 effective address of an instruction is less than 16, then the
02500 instruction uses the corresponding general register as its operand.
02600
02700 All instructions have the same format and are written for the
02800 LAP assembly program in the form
02900
03000 (<op name> <accumulator> <address> <index register>).
03100
03200 Thus %(MOVE 1 3 P)% causes accumulator %1% to receive the contents of
03300 a memory register whose address is %3+c(P), i.e. 3+<the contents of
03400 general register P. If the op name is followed by %@,then the memory
03500 reference is indirect, i.e. the effective addrress is the contents of
03600 the right half of the word directly addressed by the instruction
03700 (modified by the index register and indirect bit of that word). In
03800 the following description of some instructions useful in constructing
03900 the compiler, <ef> denotes the effective address of an instruction.
04100
04200 MOVE c(ac) ← c(<ef>)
04300 MOVEI c(ac) ← <ef>
04400 MOVEM c(<ef>) ← c(ac)
04500 HLRZ c(left half ac) ← right half of c(<ef>)
04600 HRRZ c(right half ac) ←c(right half of c(<ef>)
04700 (These two are used indirectly for %car% and %cdr% respectively.)
04800 ADD c(ac) ← c(ac) + c(<ef>)
04900 SUB c(ac) ← c(ac) - c(<ef>)
05000 JRST go to <ef>
05100 JUMPE if c(ac) = 0 then go to <ef>
05200 JUMPN if c(ac) ≠ 0 then go to <ef>
05300 CAME if c(ac) = c(<ef>) then <skip next
05400 CAMN if c(ac) ≠ 0 then <skip next instruction>
05500 PUSH c(c(right half of ac)) ← c(<ef)); the contents
05600 of each half of ac is increased by one and if
05700 the contents of the left half is then %0, a stack
05800 overflow interrupt occurs. (PUSH P ac) is
05900 is used in LISP to put c(ac) on the
06000 stack.
06100 POP c(<ef>) ←c(c(right half of ac)); the
06200 contents of each half of ac are then decreased
06300 by %1. This may be used for removing the
06400 top element of the stack. Thus (POP P 1) puts
06500 the top element of the stack in accumulator 1.
06600 The i-th element of the stack is obtained by
06700 (MOVE@ 1 i P).
06800 POPJ (POPJ P) is used for returning from a subroutine
06900
07000 These instructions are adequate for compiling basic LISP code
07100 with the addition of the subroutine calling pseudo-instrucion. (CALL
07200 n (E <subr)) is used for calling the LISP subroutine <subr> with %n%
07300 arguments. The convention is that the arguments will be stored in
07400 successive accumulators beginning with accumulator %1, and the result
07500 will be reurned in accumulator %1. In particular the functions
07600 %ATOM% and %CONS% are called with %(CALL 1 (E ATOM))% and (CALL 2 (E
07700 CONS)) respectively, and programs produced by you compiler will be
07800 called similarly when their names are referred to.
08000
08050 2. Code produced by LISP compilers.
08075
08100 I have written two compilers, the simple one called LCOM1
08200 which I have distributed, and more optimising compiler called LCOM4
08300 which I am still improving. Currently, LCOM4 produces about half as
08400 many instructions for a given function as does LCOM1. Besides these,
08500 there is the standard PDP-10 LISP compiler written at M.I.T. by
08600 Richard Greenblatt and Stuart Nelson and modified by Whitfield
08700 Diffie.
08500
08600 Consider the function %union% for giving the union of two
08700 unordered lists. We have
08900
09000 union(u,v) ← if n u then v else if a u ε v then
09100 union(d u,v) else a u . union(d u,v).
09200
09300 Its LISP definition is
09400
09500 (DE UNION (U V) (COND ((NULL U) V) ((MEMBER (CAR U) V)
09600 (UNION (CDR U) V)) (T (CONS (CAR U) (UNION (CDR U) V)))))
09700
09800 Here are the programs produced by the three compilers.
09900
10000 The basic compiler LCOM1 generates the following 46
10050 instructions:
10100 (LAP UNION SUBR)
10200 (PUSH P 1)
10300 (PUSH P 2)
10400 (MOVE 1 -1 P)
10500 (JUMPN 1 G0157)
10600 (MOVE 1 0 P)
10700 (JRST G0156)
10800 G0157
10900 (MOVE 1 -1 P)
11000 (HLRZ@ 1 1)
11100 (PUSH P 1)
11200 (MOVE 1 -1 P)
11300 (PUSH P 1)
11400 (SUB P (C 0 0 2 2))
11500 (MOVE 2 2 P)
11600 (MOVE 1 1 P)
11700 (CALL 2 (E MEMBER))
11800 (JUMPE 1 G0158)
11900 (MOVE 1 -1 P)
12000 (HRRZ@ 1 1)
12100 (PUSH P 1)
12200 (MOVE 1 -1 P)
12300 (PUSH P 1)
12400 (SUB P (C 0 0 2 2))
12500 (MOVE 2 2 P)
12600 (MOVE 1 1 P)
12700 (CALL 2 (E UNION))
12800 (JRST G0156)
12900 G0158
13000 (MOVE 1 -1 P)
13100 (HLRZ@ 1 1)
13200 (PUSH P 1)
13300 (MOVE 1 -2 P)
13400 (HRRZ@ 1 1)
13500 (PUSH P 1)
13600 (MOVE 1 -2 P)
13700 (PUSH P 1)
13800 (SUB P (C 0 0 2 2))
13900 (MOVE 2 2 P)
14000 (MOVE 1 1 P)
14100 (CALL 2 (E UNION))
14200 (PUSH P 1)
14300 (SUB P (C 0 0 2 2))
14400 (MOVE 2 2 P)
14500 (MOVE 1 1 P)
14600 (CALL 2 (E CONS))
14700 (JRST G0156)
14800 G0159
14900 G0156
15000 (SUB P (C 0 0 2 2))
15100 (POPJ P)
15200 NIL
15300
15400 LCOM4 produces the following program:
15500 (LAP UNION SUBR)
15600 (PUSH P 1)
15700 (PUSH P 2)
15800 (MOVE 1 -1 P)
15900 (JUMPN 1 G0157)
16000 (MOVE 1 0 P)
16100 (JRST 0 G0156)
16200 G0157
16300 (HLRZ@ 1 -1 P)
16400 (MOVE 2 0 P)
16500 (CALL 2 (E MEMBER))
16600 (JUMPE 1 G0158)
16700 (HRRZ@ 1 -1 P)
16800 (MOVE 2 0 P)
16900 (CALL 2 (E UNION))
17000 (JRST 0 G0156)
17100 G0158
17200 (HRRZ@ 1 -1 P)
17300 (MOVE 2 0 P)
17400 (CALL 2 (E UNION))
17500 (MOVE 2 1)
17600 (HLRZ@ 1 -1 P)
17700 (CALL 2 (E CONS))
17800 G0156
17900 (SUB P (C 0 0 2 2))
18000 (POPJ P)
18100 NIL
18200
18300 The standard PDP-10 LISP compiler produces the following
18400 code:
18600
18700 (LAP UNION SUBR)
18800 (PUSH P 1)
18900 (PUSH P 2)
19000 (JUMPN 1 G0002)
19100 (MOVE 1 2)
19200 (JRST 0 G0001)
19300 G0002 (HLRZ@ 1 1)
19400 (CALL 2 (E MEMBER))
19500 (JUMPE 1 G0003)
19600 (MOVE 2 0 P)
19700 (HRRZ@ 1 -1 P)
19800 (CALL 2 (E UNION))
19900 (JRST 0 G0001)
20000 G0003 (HLRZ@ 1 -1 P)
20100 (MOVE 2 0 P)
20200 (PUSH P 1)
20300 (HRRZ@ 1 -2 P)
20400 (CALL 2 (E UNION))
20500 (POP P 2)
20600 (CALL 2 (E XCONS))
20700 G0008
20800 G0001 (SUB P (C 0 0 2 2))
20900 (POPJ P)
21000 NIL
21100
21200
21300 Note that all three compilers produce the same first line of
21400 heading. This is necessary for the LAP assembly program which also
21500 requires the %NIL% at the end as punctuation. The standard compiler
21600 uses a function called %XCONS% that has the same effect as CONS%
21700 except that it receives its arguments in the reverse order which is
21800 sometimes convenient. This requires, of course, that the compiler be
21900 smart enough to decide where it is more convenient to put the
22000 arguments of the %cons% function. You may use %XCONS% if you like,
22100 but you may have better ways of using your time.
22200
22300 My two compilers are similar in structure, but the better
22400 one, which is twice as long, uses the following optimisations.
22500
22600 1. When the argument of CAR or CDR is a variable, it compiles
22700 a (HLRZ@ 1 i P) or (HRRZ@ 1 i P) which gets the result through the
22800 stack without first compiling the argument into an accumulator.
22900
23000 2. When it has to set up the arguments of a function in the
23100 accumulators, in general, the program must compute the arguments one
23200 at a time and save them on the stack, and then load the accumulators
23300 from the stack. However, if one of the arguments is a variable, is a
23400 quoted expression, or can be obtained from a variable by a chain of
23500 %cars% and %cdrs, then it need not be computed until the time of
23600 loading accumulators since it can be computed using only the
23700 accumulator in which it is wanted.
23800
23900 3. LCOM1 computes Boolean expressions badly and generates
24000 many unnecessary labels and JRSTs. LCOM4 is more sophisticated about
24100 this.
24200
24300 4. The Diffy compiler produces about 10 percent less code
24400 than LCOM4; the main difference seems to be that it sometimes notices
24500 when it has an expression that it will need later. Sometimes this
24600 feature leads it astray as in the above example wherein saves %a%u%
24700 for later use at greater cost than re-computing it.
24800
24900 It should further be noted that quoted S-expressions can be
25000 compiled with the instruction
25100
25200 (MOVEI 1 (QUOTE α)).
25300
25400 As someone pointed out in class, the compiler as distributed
25500 incorrectly used MOVE there.
25600
25700 You are welcome to pirate anything you want from
25800 LCOM1[206,JMC]. I particularly recomment the function COMPL itself
25900 that handles all the I-O in such a way as to produce a file
26000 acceptable to LAP.
26100
26200 3. Running the compilers.
26300
26400 If you want to run one of these compilers, say LCOM4, to see
26500 what sort of code it produces, here is the procedure:
26600
26700 1. Prepare a file <file name> without an extension containing
26800 the functions to be compiled in LISP S-expression form, i.e. (DE
26900 <function name> <variable list> <function body>).
27000
27100 2. Give the system command RU LCOM4[206,JMC]<cr>. LCOM4 will
27200 reply with *.
27300
27400 3. Give the RLISP command COMPL <file name>;<cr>. COMPL is
27500 an FEXPR so that <file name> does not have to be preceded by '.
27600 LCOM4 will respond with a list of the functions compiled and the
27700 lengths of the compiled programs exaggerating slightly since all
27800 expressions in the output list are counted. <control C> will get you
27900 back to the system and a file called <file name>.LAP has the compiled
28000 programs.
28100
28200 4. You can look at the program by TYPE <file name>.LAP<cr>,
28300 and if you want to try out the functions you can enter LISP by R
28400 LISP<cr> and after answering N to the ALLOC? question, you can read
28500 in the functions by (DSKIN (<file name>.LAP))<cr>, and they can be
28600 tested in the usual LISP way.